Conversation

Member

@pedro-psb pedro-psb commented Feb 4, 2026

Added a proper error class for duplicate content handling, plus some more logging that reports exactly which content items conflict.

When duplicates are detected, we do some extra work to collect duplicate content.
A simple performance test shows it's not too bad:

In [1]: import pulpcore.plugin.repo_version_utils as ut

In [2]: content_qs = Package.objects.all()

In [3]: unique_keys = Package.repo_key_fields

In [4]: content_qs.count()
Out[4]: 24547

In [5]: ut.count_duplicates(content_qs, unique_keys)
Out[5]: 7701

In [6]: %timeit ut.count_duplicates(content_qs, unique_keys)
35.1 ms ± 154 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit ut.collect_duplicates(content_qs, unique_keys)
61.4 ms ± 266 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
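The semantics of the two helpers timed above can be sketched ORM-free. The following is a minimal analogue operating on plain dicts rather than a Django queryset, so the row shape and return type are illustrative assumptions, not the actual implementation:

```python
from collections import defaultdict


def count_duplicates(rows, unique_keys):
    # Analogue of content_qs.count() - content_qs.distinct(*unique_keys).count():
    # total rows minus the number of distinct key tuples.
    seen = {tuple(r[k] for k in unique_keys) for r in rows}
    return len(rows) - len(seen)


def collect_duplicates(rows, unique_keys):
    # Group primary keys by their repo-key tuple and keep only the groups
    # with more than one member (the conflicting content).
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in unique_keys)].append(r["pk"])
    return {key: pks for key, pks in groups.items() if len(pks) > 1}


rows = [
    {"pk": 1, "name": "foo", "version": "1.0"},
    {"pk": 2, "name": "foo", "version": "1.0"},  # duplicate of pk=1
    {"pk": 3, "name": "bar", "version": "2.0"},
]
print(count_duplicates(rows, ("name", "version")))    # 1
print(collect_duplicates(rows, ("name", "version")))  # {('foo', '1.0'): [1, 2]}
```

The count variant only needs two aggregate queries, while the collect variant materializes the conflicting groups, which matches the roughly 2x timing difference shown above.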

Closes: #7184

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage have been added

See: Pull Request Walkthrough

@pedro-psb pedro-psb force-pushed the fix/7184-report-conflicting-packages branch from 26e2e75 to 7e9e749 Compare February 4, 2026 14:42
@pedro-psb pedro-psb force-pushed the fix/7184-report-conflicting-packages branch from 7e9e749 to a42e341 Compare February 4, 2026 14:45
@pedro-psb pedro-psb changed the title Add better error handling for repover duplicate content [PULP-1118] Add better error handling for repover duplicate content Feb 4, 2026
"""

def __init__(self, duplicate_count: int, correlation_id: str):
    self.dup_count = duplicate_count
Contributor

It would seem like this assumes that a RepositoryVersion creation always failed due to duplicates? Do we want to assume that? Should the error be more specific?

Member Author

Yeah, makes sense. I'll make a more specific one.

def log_duplicate(pulp_type: str, duplicate: DuplicateEntry):
    keyset_value = duplicate.keyset_value
    duplicate_pks = duplicate.duplicate_pks
    _logger.info(f"Duplicates found: {pulp_type=}; {keyset_value=}; {duplicate_pks=}")
Contributor

Is there a particular reason to separate this into its own function if it's only used in one place?

Member Author

Just for the abstraction. At a glance, it's trivial to read the main validate function and understand what it does, then dig into the helper functions if you're interested in implementation details.
If the call cost is not relevant (I believe it isn't here), then it's just a matter of style. I can undo that if you prefer.

Contributor

It's just a matter of style, but if only one line is doing work here, I'd rather just inline it personally

I do not consider it a blocker if you have strong feelings about it or if there are plans to re-use it in the future, though.

Member Author

Ok, I'll inline the log one. I'll keep count_duplicates and collect_duplicates, because it was at least convenient to test them in isolation.

def count_duplicates(content_qs, unique_keys: tuple[str, ...]) -> int:
    new_content_total = content_qs.count()
    unique_new_content_total = content_qs.distinct(*unique_keys).count()
    return new_content_total - unique_new_content_total
Contributor

same

Contributor

This case is at least more sensible than the other one though, since there's more than one line that's actually doing something

Added a proper error class for duplicate content handling and some more
logging to report exactly which content items conflict.

Closes: pulp#7184
@pedro-psb pedro-psb force-pushed the fix/7184-report-conflicting-packages branch from abc7a12 to c43e01d Compare February 6, 2026 17:25
@pedro-psb pedro-psb enabled auto-merge (rebase) February 6, 2026 17:45
keyset_value = duplicate.keyset_value
duplicate_pks = duplicate.duplicate_pks
_logger.info(f"Duplicates found: {pulp_type=}; {keyset_value=}; {duplicate_pks=}")
if dup_count > 0:
Member

I think the logic is messed up a bit:
if you have two types in the loop and the first one has duplicates but the second is fine, this will not raise.
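The concern can be demonstrated with a small sketch. A possible fix is to accumulate counts across all types and raise once after the loop, rather than letting the last iteration decide; the error class name and loop shape below are assumptions for illustration, not the actual patch:

```python
class DuplicateContentError(Exception):
    """Raised when a repository version would contain duplicate content."""


def validate_no_duplicates(duplicates_by_type: dict[str, int]) -> None:
    # If we raised based only on the current iteration's count, a clean
    # last type would mask duplicates found in earlier types. Accumulating
    # across the whole loop and raising afterwards avoids that.
    total = 0
    for pulp_type, dup_count in duplicates_by_type.items():
        if dup_count > 0:
            total += dup_count
    if total > 0:
        raise DuplicateContentError(f"{total} duplicate content units found")


# First type has duplicates, second is clean: this must still raise.
try:
    validate_no_duplicates({"rpm.package": 2, "rpm.advisory": 0})
except DuplicateContentError as exc:
    print(exc)  # 2 duplicate content units found
```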


Successfully merging this pull request may close these issues.

[PULP-1118] Sync failures due to NEVRA duplicates should also report the conflicting packages
